(54) Text searching system

(57) The invention relates to a text searching system (30, 60) for searching web pages according to keyword and classification data (46, 64) provided by a user. The text searching system (30, 60) comprises a computer having a memory (32) for storing programs and data and a processor (34) for executing the programs stored in the memory (32), a text data file (36) stored in the memory (32) having text data (38) of web pages of a plurality of world wide web sites, a text index file (40) stored in the memory (32) having keyword searching data (42) for searching keywords contained in the text data (38) of each of the web pages of the text data file (36), a classification index file (44, 62) stored in the memory (32) having classification data (46, 64) corresponding to the classification (54) of each of the web pages of the text data file (36), and a searching program (48, 66) stored in the computer for searching the text index file (40) and the classification index file (44, 62) according to keyword and classification data (46, 64) provided by a user so as to find text data (38) which match the user provided keyword data and are contained in a plurality of target web pages whose classifications (54) match the user provided classification data (46, 64) in the text data file (36).
Description

[0001] The invention relates to a text searching system according to the pre-characterizing portion of claim 1.

[0002] As the number of web pages on the internet increases, a searching system becomes necessary for searching the myriad of web pages for specific information. A corresponding prior art searching system comprises a computer having a memory in which a text data file, a text index file comprising keyword searching data, and a searching program are stored. Since the system uses only a keyword for searching web pages, the text data of all the web pages containing the keyword are returned. This requires an excessive amount of transmission time, and most of the transmitted web pages do not fit well into the classification intended by the user. Therefore, additional search and transmission time must be spent. For example, if the user wants to search for web pages about movies containing references to "tornado", the searching system will transmit to the user the text data of all web pages containing the word "tornado". However, these transmitted web pages will include irrelevant pages concerning unrelated topics such as meteorology, history and news. Therefore, more time must be spent manually selecting the pages that are actually pertinent.

[0003] With these problems in mind, the present invention aims at providing a text searching system for searching web pages according to a keyword which is nevertheless more economical with respect to both transmission time and subsequent manual selection.
[0004] This is achieved by the present invention as claimed in claim 1. The dependent claims define advantageous further developments of the invention.

[0005] Because, according to the invention, the system additionally includes a classification index file having classification data, and a searching program for searching text data matching both user provided keyword data and user provided classification data, the search can be performed in a much more defined manner, thus avoiding the output of unrelated pages.
[0006] In the following, the invention is described in more detail with reference to the accompanying drawings, in which

Fig. 1 is a functional block diagram of a prior art searching system as mentioned above,
Fig. 2 is a perspective diagram of the keyword searching data in the system of Fig. 1,
Fig. 3 is a functional block diagram of a text searching system according to the present invention, and
Fig. 4 is a perspective diagram of another text searching system according to the present invention.

[0007] The prior art text searching system 10 shown in Fig. 1 comprises a computer (not shown), a text data file 16, a text index file 20, and a searching program 24. The computer comprises a memory 12 for storing programs and data and a processor 14 for executing the programs stored in the memory 12. The text data file 16, text index file 20, and searching program 24 are stored in the memory 12. The text data file 16 has text data 18 of web pages of a plurality of world wide web sites. The text index file 20 has keyword searching data 22 for searching keywords contained in the text data 18 of each of the web pages of the text data file 16. The searching program 24 is used for searching the text index file 20 according to keyword data provided by a user so as to find the text data 18 of all the web pages in the text data file 16 having the user provided keyword data.

[0008] As can be seen in Fig. 2, the keyword searching data 22 of the text index file 20 are built according to the text data 18 of the text data file 16. Each set of keyword searching data 22 has a keyword 21 and address data 23 of the keyword 21 in all web pages. As shown in Fig. 2, the address data of the keyword "world" in all web pages are a1, a2, a3...; the address data of the keyword "world wide web" in all web pages are c1, c2, c3.... When the user inputs a keyword, the searching program 24 searches the text index file 20 according to the keyword provided by the user to find the keyword searching data 22 corresponding to the keyword, thereby obtaining the address data of the keyword in all web pages. Finally, the text data file 16 is used for transmitting to the user the text data 18 of all web pages having the keyword.
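
The keyword searching data of Fig. 2 can be pictured as an inverted index. The following is a minimal Python sketch under stated assumptions, not the implementation described in the specification: the function names `build_text_index` and `keyword_search`, the representation of address data as (page id, word offset) pairs, the restriction to single-word keywords, and the sample pages are all hypothetical.

```python
from collections import defaultdict


def build_text_index(text_data):
    """Build keyword searching data: keyword -> list of address data.

    Assumption: address data is a (page_id, word_offset) pair.
    """
    index = defaultdict(list)
    for page_id, text in text_data.items():
        for offset, word in enumerate(text.lower().split()):
            index[word].append((page_id, offset))
    return dict(index)


def keyword_search(text_index, text_data, keyword):
    """Return the text data of every page containing the keyword."""
    addresses = text_index.get(keyword.lower(), [])
    page_ids = sorted({page_id for page_id, _ in addresses})
    return {page_id: text_data[page_id] for page_id in page_ids}


# Hypothetical text data file: page id -> text data.
text_data = {
    "p1": "tornado movie review",
    "p2": "tornado warning issued by weather service",
    "p3": "history of cinema",
}
text_index = build_text_index(text_data)
results = keyword_search(text_index, text_data, "tornado")
print(sorted(results))  # ['p1', 'p2']
```

In this sketch the search returns both the movie page and the weather page, which illustrates why a keyword-only search transmits pages outside the classification the user is interested in.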

[0009] As mentioned before, because the prior art searching system 10 uses a keyword for searching web pages, the text data of all web pages containing the keyword are returned. This takes an excessive amount of time to transmit. In searching for the web pages within a specific classification, the searching system 10 transmits the text data of all the web pages containing the keyword to the user, but most of the transmitted web pages are not well matched with the user provided classification. Therefore, more search and transmission time must be spent and, nevertheless, the pages actually pertinent finally have to be selected manually.

[0010] A text searching system 30 according to the present invention, as shown in Fig. 3, comprises a computer (not shown), a text data file 36, a text index file 40, a classification index file 44, and a searching program 48. The computer comprises a memory 32 for storing programs and data and a processor 34 for executing the programs stored in the memory 32. The text data file 36, text index file 40, classification index file 44 and searching program 48 are stored in the memory 32. The text data file 36 has text data 38 of web pages of a plurality of world wide web sites. The text index file 40 has keyword searching data 42 for searching keywords contained in the text data 38 of each of the web pages of the text data file 36. The classification index file 44 has classification data 46 corresponding to the classification of each of the web pages of the text data file 36. The searching program 48 is used for searching the text index file 40 and the classification index file 44 according to keyword and classification data provided by a user so as to find text data 38 which match the user provided keyword data and are contained in a plurality of target web pages whose classifications match the user provided classification data in the text data file 36.
[0011] The keyword searching data 42 of the text index file 40 is built according to the text data 38 of the text data file 36. Each keyword searching data 42 has a keyword and address data of the keyword in all web pages. Each classification data 46 of the classification index file 44 has a plurality of classifications 54, and each classification 54 has web page data 50 of all the web pages belonging to the classification. Each web page data 50 comprises a keyword position indexing data 52 of the web page. The keyword position indexing data 52 is used for pointing to the positions of the keyword searching data 42 of the specific web page contained in the text index file 40.
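
One way to picture the relationship between the classification index file, the web page data and the keyword position indexing data is sketched below in Python. This is an illustrative layout under stated assumptions only: the specification does not prescribe a storage format, and the names `WebPageData`, `keyword_position` and `build_indexes`, the per-page organisation of the text index, and the sample pages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WebPageData:
    """Web page data held under one classification (hypothetical layout)."""
    page_id: str
    keyword_position: int  # position of this page's entry in the text index


def build_indexes(pages):
    """Build both index files from (page_id, classification, text) tuples."""
    text_index_file = []        # one entry per page: keyword -> word offsets
    classification_index = {}   # classification -> list of WebPageData
    for page_id, classification, text in pages:
        keyword_data = {}
        for offset, word in enumerate(text.lower().split()):
            keyword_data.setdefault(word, []).append(offset)
        position = len(text_index_file)
        text_index_file.append(keyword_data)
        classification_index.setdefault(classification, []).append(
            WebPageData(page_id, position)
        )
    return text_index_file, classification_index


# Hypothetical sample pages.
pages = [
    ("p1", "movies", "tornado movie review"),
    ("p2", "meteorology", "tornado warning issued"),
]
text_index_file, classification_index = build_indexes(pages)
movie_page = classification_index["movies"][0]
print(movie_page.page_id, text_index_file[movie_page.keyword_position]["tornado"])
```

The point of the layout is that each web page data entry carries a pointer into the text index, so a search restricted to one classification never has to scan the keyword data of pages in other classifications.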
[0012] When a user inputs keyword and classification data, the searching program 48 searches the classification index file 44 according to the classification data provided to find the web page data 50 of all web pages belonging to the classification data. Then, the searching program 48 searches the position of the keyword searching data 42 of the text data 38 of each web page in the text index file 40 according to the keyword position indexing data 52 of the web page data 50. Then, the searching program 48 searches the keyword searching data 42 of all web pages belonging to the classification data in the text index file 40 according to the keyword provided by the user to find the text data 38 of all web pages which belong to the classification data and have the keyword. Finally, the text data file 36 is used for transmitting to the user the text data 38 of all web pages belonging to the classification data and having the keyword.
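
The four steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the searching program 48 itself: the function name `search_by_classification_first`, the use of (page id, position) tuples as web page data, and the sample data are hypothetical.

```python
def search_by_classification_first(classification_index, text_index_file,
                                   text_data_file, classification, keyword):
    """Look up the classification first, then test the keyword per page."""
    results = {}
    # Step 1: web page data of all pages in the requested classification.
    for page_id, position in classification_index.get(classification, []):
        # Step 2: follow the keyword position indexing data into the text index.
        keyword_data = text_index_file[position]
        # Step 3: keep the page only if its keyword data contains the keyword.
        if keyword.lower() in keyword_data:
            # Step 4: fetch the text data to be transmitted to the user.
            results[page_id] = text_data_file[page_id]
    return results


# Hypothetical sample data.
text_data_file = {
    "p1": "tornado movie review",
    "p2": "tornado warning issued",
    "p3": "classic movie history",
}
text_index_file = [
    {"tornado": [0], "movie": [1], "review": [2]},
    {"tornado": [0], "warning": [1], "issued": [2]},
    {"classic": [0], "movie": [1], "history": [2]},
]
classification_index = {
    "movies": [("p1", 0), ("p3", 2)],
    "meteorology": [("p2", 1)],
}
hits = search_by_classification_first(
    classification_index, text_index_file, text_data_file, "movies", "tornado"
)
print(hits)  # {'p1': 'tornado movie review'}
```

Only the page that is both classified under "movies" and contains "tornado" is returned; the meteorology page is never examined.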

[0013] Fig. 4 is a perspective diagram of another text searching system 60 according to the present invention. The classification index file 62 of the text searching system 60 contains the classification data 64 of the web pages of each keyword searching data 42 in the text index file 40. When a user inputs keyword and classification data, the searching program 66 searches the text index file 40 according to the keyword provided to find all the keyword searching data 42 matching the user provided keyword data and the address data of the keyword in all the web pages. Then, the searching program 66 searches the classification index file 62 according to the keyword searching data 42 to find the classification data 64 of the web page of each matched keyword searching data 42. The searching program 66 then uses the classification data provided by the user to select all keyword searching data 42 belonging to that classification, thereby finding the text data 38 of all web pages which belong to the classification data and have the keyword. Finally, the text data file 36 is used for transmitting to the user the text data 38 of all web pages belonging to the classification data and having the keyword.
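
The keyword-first order of this second system can be sketched in the same way. Again this is a hedged Python illustration, not the searching program 66: the function name `search_by_keyword_first`, the simplification of the classification index file 62 to a page id to classification mapping, and the sample data are hypothetical.

```python
def search_by_keyword_first(text_index, classification_index, text_data_file,
                            keyword, classification):
    """Look up the keyword first, then filter the matches by classification."""
    results = {}
    # Step 1: address data of the keyword in all web pages.
    for page_id, _offset in text_index.get(keyword.lower(), []):
        # Step 2: classification of the page behind each matched entry.
        if classification_index.get(page_id) == classification:
            # Step 3: fetch the text data to be transmitted to the user.
            results[page_id] = text_data_file[page_id]
    return results


# Hypothetical sample data.
text_data_file = {
    "p1": "tornado movie review",
    "p2": "tornado warning issued",
    "p3": "classic movie history",
}
text_index = {
    "tornado": [("p1", 0), ("p2", 0)],
    "movie": [("p1", 1), ("p3", 1)],
}
classification_index = {"p1": "movies", "p2": "meteorology", "p3": "movies"}

hits = search_by_keyword_first(
    text_index, classification_index, text_data_file, "tornado", "movies"
)
print(hits)  # {'p1': 'tornado movie review'}
```

Compared with a classification-first search, this order touches every page containing the keyword but needs only one classification lookup per match.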

[0014] The text searching system 30 uses the classification index file 44 to find all web pages belonging to the classification data provided by the user, and then uses the text index file 40 and the keyword provided by the user to find all the web pages belonging to the classification data and having the keyword. The text searching system 60 uses the text index file 40 to find all web pages having the keyword provided by the user, and then uses the classification index file 62 and the classification data provided by the user to find all the web pages belonging to the classification data and having the keyword.

[0015] Compared with the prior art searching system 10, the text searching systems 30, 60 according to the present invention use keyword and classification data provided by the user and find all the web pages that belong to the classification data and have the keyword. The text searching systems 30, 60 transmit to the user only the text data of the web pages belonging to the classification data and having the keyword. Therefore, the searching and transmission time is greatly reduced and the text searching system is more efficient.

Claims 

1. A text searching system (30, 60) comprising:

a computer having a memory (32) for storing programs and data and a processor (34) for executing the programs stored in the memory (32);

a text data file (36) stored in the memory (32) having text data (38) of web pages of a plurality of world wide web sites; and
a text index file (40) stored in the memory (32) having keyword searching data (42) for searching keywords contained in the text data (38) of each of the web pages of the text data file (36);
characterized in that:

the text searching system (30, 60) further comprises:

a classification index file (44, 62) stored in the memory (32) having classification data (46, 64) corresponding to the classification (54) of each of the web pages of the text data file (36); and
a searching program (48, 66) stored in the computer for searching the text index file (40) and the classification index file (44, 62) according to keyword and classification data (46, 64) provided by a user, so as to find text data (38) which are matched with the user provided keyword data and contained in a plurality of target web pages whose classifications (54) are matched with the user provided classification data (46, 64) in the text data file (36).

2. The text searching system (30) of claim 1 wherein the classification index file (44) contains a plurality of classifications (54) and web page data (50) of all the web pages belonging to each of the classifications (54), and wherein the searching program (48) searches the classification index file (44) to find all the target web pages whose classifications (54) are matched with the user provided classification data (46), and then searches the text index file (40) to find text data (38) which are matched with the user provided keyword data and contained in the target web pages of the text data file (36).

3. The text searching system (30) of claim 2 wherein the web page data (50) of each specific web page in the classification index file (44) contain keyword position indexing data (52) for pointing to the positions of the keyword searching data (42) of the specific web page contained in the text index file (40), and wherein the searching program (48) searches the classification index file (44) to find the positions of the keyword searching data (42) of the target web pages in the text index file (40), and then searches the keyword searching data (42) of the target web pages to find the text data (38) which are matched with the user provided keyword data and contained in the target web pages of the text data file (36).

4. The text searching system (60) of claim 1 wherein the classification index file (62) contains the classification of the web page of each keyword searching data (42) in the text index file (40), and wherein the searching program (66) searches the text index file (40) to find all the keyword searching data (42) matched with the user provided keyword data, and then searches the classification index file (62) to find the classification of the web page of each matched keyword searching data (42) so as to locate the keyword searching data (42) of the target web pages, and finally finds the text data (38) contained in the text data file (36) using the keyword searching data (42) of the target web pages.






EP 1 056 024 A1 



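Fig. 1 (Prior art): functional block diagram of the searching system 10, showing the processor 14 and the memory 12, which holds the searching program 24, the text data file 16 containing the text data 18, and the text index file 20 containing the keyword searching data 22.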






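Fig. 2 (Prior art): keyword searching data 22, each set pairing a keyword 21 with its address data 23:

Keyword (21)      Address data (23)
World             a1, a2, a3
World wide        b1, b2, b3
World wide web    c1, c2, c3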






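Fig. 3: functional block diagram of the text searching system 30, showing the processor 34 and the memory 32, which holds the searching program 48, the text data file 36 containing the text data 38, the text index file containing the keyword searching data 42, and the classification index file containing the classification data 46, the classifications 54, the web page data 50 and the keyword position indexing data 52.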






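Fig. 4: diagram of the text searching system 60, showing the processor 34 and the memory 32, which holds the searching program 66, the text data file containing the text data, the text index file containing the keyword searching data, and the classification index file containing the classification data.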






EUROPEAN SEARCH REPORT

Application number: EP 99 10 9330

DOCUMENTS CONSIDERED TO BE RELEVANT

WO 98 09229 A (TELEVITESSE SYSTEMS INC; STREATCH PAUL (CA); REED JIM (CA)), 5 March 1998 (1998-03-05); the whole document. Relevant to claims 1, 2, 4.

ANONYMOUS: "Taxonomized Web Search", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 40, no. 5, 1 May 1997 (1997-05-01), pages 195-196, XP002133594, New York, US; the whole document. Relevant to claims 1-4.

HEARST M A ET AL: "CAT-A-CONE: AN INTERACTIVE INTERFACE FOR SPECIFYING SEARCHES AND VIEWING RETRIEVAL RESULTS USING A LARGE CATEGORY HIERARCHY", ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, US, NEW YORK, NY: ACM, 1997, pages 246-255, XP000782010, ISBN: 0-89791-836-3; page 251, column 1, line 7 - page 251, column 2, line 28; page 252, column 1, line 38 - page 253, column 1, line 42. Relevant to claims 1-4.

Classification of the application: G06F17/30
Technical fields searched: G06F

Place of search: THE HAGUE
Date of completion of the search: 21 March 2000
Examiner: Abbing, R








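ANNEX TO THE EUROPEAN SEARCH REPORT ON EUROPEAN PATENT APPLICATION NO. EP 99 10 9330

Patent document cited in the search report, with its publication date, followed by the patent family members and their publication dates:

WO 9809229 A    05-03-1998    CA 2184518 A    01-03-1998
                              AU 4007497 A    19-03-1998
                              EP 0922280 A    18-08-1999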






